Dual-use scoring of patents based on semantic similarity between patent abstracts and items on the Wassenaar Arrangement (WA) control lists.

Proof of Concept

Data:

  1. 50,000 US patents granted between 2015 and 2020 (sourced from lens.org).

  2. Wassenaar Arrangement (WA) control list items for dual-use technologies (https://www.wassenaar.org/control-lists/). These lists define the controlled categories used in Israeli arms export control.

Method:

  1. Text representation of both patent abstracts and WA control-list text (item list by category).
  • The current baseline uses TF-IDF; embeddings would likely capture semantic similarity better.
  2. Compute a patent-level ‘duality score’ using cosine similarity:
  • Compute the cosine similarity between each patent abstract and each WA text.
  • Assign each patent its maximum similarity across all WA texts.
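
The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual code: the function names (tokenize, tfidf_vectors, cosine, duality_scores), the smoothed IDF formula, and the toy strings are all assumptions made for the example. In particular, the "WA" strings are invented placeholders, not actual control-list wording.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def duality_scores(patent_abstracts, wa_texts):
    """For each patent, return (max similarity, index of best-matching WA text).

    Assumes wa_texts is non-empty.
    """
    n_patents = len(patent_abstracts)
    # Fit one shared vocabulary over both corpora so the vectors are comparable.
    vectors = tfidf_vectors(list(patent_abstracts) + list(wa_texts))
    patent_vecs, wa_vecs = vectors[:n_patents], vectors[n_patents:]
    results = []
    for p in patent_vecs:
        sims = [cosine(p, w) for w in wa_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((sims[best], best))
    return results


# Toy inputs, invented for illustration only.
patents = [
    "A guidance system for an unmanned aerial vehicle using inertial navigation sensors.",
    "A recipe for baking sourdough bread with whole wheat flour.",
]
wa_items = [
    "Unmanned aerial vehicles and related navigation and guidance equipment.",
    "Cryptographic equipment employing symmetric algorithms.",
]

results = duality_scores(patents, wa_items)
for abstract, (score, idx) in zip(patents, results):
    print(f"{score:.3f}  best WA text #{idx}  {abstract[:40]}")
```

In this toy run the first abstract shares several terms with the first WA placeholder and gets a positive score, while the second shares none and scores zero. Swapping TF-IDF for embeddings would only change how the vectors are built; the cosine and maximum steps stay the same.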

Next step: select a classification threshold to flag likely dual-use patents.
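
One possible way to choose the threshold, assuming a small hand-labelled validation set becomes available (the project does not state that one exists), is to scan the observed scores and keep the cut-off that maximises F1. The function name best_f1_threshold and the scores and labels below are hypothetical toy values, not project results.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) maximising F1 when flagging score >= threshold."""
    best_t, best_f1 = None, -1.0
    # Only thresholds equal to an observed score can change the predictions.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical duality scores with manual dual-use labels (1 = dual-use).
scores = [0.05, 0.10, 0.32, 0.41, 0.48, 0.77]
labels = [0, 0, 0, 1, 0, 1]

threshold, f1 = best_f1_threshold(scores, labels)
print(f"threshold={threshold}, F1={f1:.2f}")
```

F1 is only one choice of criterion; if missing a dual-use patent is costlier than a false alarm, a recall-weighted criterion or a fixed top-percentile cut-off may be more appropriate.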

Example:

A sample of 500 patents from the data, reporting for each patent the duality_score, the best-matching WA item, and the matched text snippet (hover over the bars to see these details).